Applications

KnowledgeMiner
Various Applications and Examples
Demonstrations of the KnowledgeMiner Advantage

Example Applications

Self-organizing modeling is applied to decision support in economics (analysis and prediction of economic systems; market, sales and financial predictions; balance sheet prediction), in ecology (analysis and prediction of ecological processes such as air and soil temperature, air and water pollution, growth of wheat, drainage flow, Cl- and NO₃-settlement, and the influence of natural position factors on harvest), and in all other fields where only little a priori knowledge about the system is available. Compared with the Neural Network approach and with statistics, self-organizing modeling has three advantages: it works very fast, systematically and objectively, since only minimal a priori information (pre-definitions) is required; it provides explicit models as an explanation component, making hidden knowledge visible and usable; and the results obtained are on average as good as or better than those of other modeling techniques. Sometimes self-organizing modeling is the only way to get results for a problem at all.

The following examples are included in the downloadable KnowledgeMiner package.

National Economy

This example shows the prediction of 13 important characteristics of a national economy 3 years ahead along with their corresponding models.

The data consist of 27 yearly observations (1960 - 1986) of 13 variables such as Gross Domestic Product, Unemployed Persons, Savings, Cash Circulation, Personal Consumption and State Consumption. From this data set (13 columns, 27 rows) and a chosen system dynamic of up to 3 years (that is, time lags of up to 3 years), the normalized information basis for the self-organization of a linear system of equations is constructed automatically for each variable, behind the scenes. For x₁, for instance, a model is created from this information basis:

x_{1,t} = f(x_{2,t}, x_{3,t}, ..., x_{13,t}, x_{1,t-1}, x_{2,t-1}, ..., x_{13,t-1}, x_{1,t-2}, ..., x_{13,t-3}).

The information basis has 51 input variables (columns) and 24 observations (rows): the 12 other unlagged variables plus 3 × 13 lagged ones, with the first 3 years lost to lagging. The task of the modeling process is now to evolve a specific instance of the function f, subject to a number of requirements, so as to arrive at a robust model of optimal complexity. This is done automatically not only for x₁ but for all 13 variables, while avoiding conflicts between the unlagged variables. The result is an optimal, autonomous system of equations that can predict all 13 variables within one process and that is displayed as a system graph.
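The construction of the information basis can be sketched in a few lines of Python. This is only an illustration of the lagging scheme described above, not KnowledgeMiner's own code: the function name build_information_basis and the placeholder data are assumptions, and normalization is left out.

```python
def build_information_basis(data, target, max_lag):
    """Return (X, y) for one target column.

    data    : list of rows, one per time step, each a list of variable values
    target  : column index of the variable to model
    max_lag : largest time lag to include (the "system dynamic")
    """
    n_vars = len(data[0])
    X, y = [], []
    # The first max_lag rows have no complete history, so they are dropped.
    for t in range(max_lag, len(data)):
        row = []
        # Unlagged values of all other variables: x_j(t), j != target
        row.extend(data[t][j] for j in range(n_vars) if j != target)
        # Lagged values of every variable, including the target itself
        for lag in range(1, max_lag + 1):
            row.extend(data[t - lag])
        X.append(row)
        y.append(data[t][target])
    return X, y


# Placeholder data with the same shape as the example: 27 years x 13 variables.
data = [[float(100 * year + var) for var in range(13)] for year in range(27)]
X, y = build_information_basis(data, target=0, max_lag=3)
print(len(X), len(X[0]))  # 24 51
```

With 13 variables and lags of up to 3 years, each row holds 12 unlagged and 39 lagged values, which reproduces the 51 columns and 24 rows stated above.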

Balance Sheet

Balance sheet analysis and prediction is an important part of fundamental analysis in finance. Reliable information about the status and evolution of a company is a key factor for success in that area. Much of this information is contained in the balance sheet characteristics and in overall market and macroeconomic data (see National Economy). The difficulty lies in the large number of variables, the very small number of observations available for balance sheets, and the unknown relationships and dynamics between these variables. Here, statistics as well as Neural Networks are practically not applicable. GMDH (Group Method of Data Handling) is.

In the balance sheet example shown here, only the balance sheet characteristics themselves are used to predict them 1 year ahead. The data comprise 13 characteristics for 7 years (1986 - 1992). Again, a dynamic, linear system of equations was self-organized as described above. The average percentage error of the 1993 prediction was 16%.

COD concentration

This example describes a water pollution problem [Farlow, 1984]. COD stands for Chemical Oxygen Demand and is used as a proxy measure of water pollution. One difficulty here is that only a few characteristics, such as water temperature, salt concentration or transparency, are measurable, and that these measurements are very noisy. Many attempts have been made to develop a mathematical model that predicts COD levels so that water pollution can be controlled in compliance with the standards.

Usually, the behavior of COD in a bay is calculated by the nonreaction-diffusion model. However, this model has some significant defects, which is the reason for seeking alternative methods. GMDH provides one such way of predicting COD. Using 6 characteristics with 40 monthly observations, it is possible to predict COD 5 months ahead with satisfactory results by creating a system of equations. The obtained system graph is shown as an example in the figure below.

System of equations obtained for the COD prediction problem

Flats (Best Buy Apartments)

This example presents a different kind of problem: a market analysis task that searches for favorable or costly flat rates among a given number of comparable objects. Six characteristics serve as input variables (location, type, requested equipment, extra equipment, number of rooms and m²/room). They are obtained from matrices of several subjectively ranked subcriteria and are expected to influence the variable of interest, rate/m², in some way. In this case, static linear and nonlinear models were created using the characteristics of 30 flats. Since the obtained models reflect different relationships, they were combined to provide a more robust decision basis. See the live example to get an idea of how this works.

Stock indexes - prediction of Dow Jones and S&P500 at the NYSE using daily close prices

Trading in currencies, international stocks and derivatives contracts plays an increasing role for many investors. Decision making in this field of finance demands tools that can generate a trading signal on the basis of predictions. Financial objects are complex, ill-defined systems that can be characterized by

inadequate a priori information
great number of unmeasurable variables
noisy and extremely short data samples
ill-defined objects with fuzzy characteristics.

For these objects, only minimal a priori knowledge is available and no definite theory is on hand. In such cases knowledge extraction from data, i.e. deriving a model from experimental measurements, has advantages over traditional deductive logical-mathematical methods. Therefore, knowledge-based activities in modeling such objects must be supported by methods of self-organizing modeling.

The example contains 150 daily close prices of the Dow Jones indexes, the low, high and close of the S&P500 index, and a volume index at the New York Stock Exchange. From these data a linear system of equations was created automatically for a chosen system dynamic of up to 5 days. Data transformation, data subdivision, creation and validation of Active Neurons, selection of surviving neurons, network structure synthesis and avoidance of conflicts in the system of equations are performed systematically and automatically by KnowledgeMiner. All models are stored in a model base and can be applied immediately within KnowledgeMiner. There is no need to export models as C code and build that code into other programs to run them.
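To give an idea of what data subdivision, neuron creation and selection of survivors mean, here is a minimal sketch of a single generic GMDH-style selection layer in Python. It is not KnowledgeMiner's implementation: the two-input linear neuron, the alternating learning/checking split, the squared-error criterion and all names (solve, fit_neuron, gmdh_layer) are simplifying assumptions made for illustration, and the data are made up.

```python
from itertools import combinations


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_neuron(X, y, i, j):
    """Least-squares fit of y ~ a0 + a1*x_i + a2*x_j via the normal equations."""
    rows = [[1.0, r[i], r[j]] for r in X]
    AtA = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    Aty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(3)]
    return solve(AtA, Aty)


def predict(coef, row, i, j):
    return coef[0] + coef[1] * row[i] + coef[2] * row[j]


def gmdh_layer(X, y, n_survivors=2):
    """Fit one neuron per input pair on a learning subset, rank on a checking subset."""
    # Data subdivision: alternate rows between learning and checking subsets.
    learn = [k for k in range(len(X)) if k % 2 == 0]
    check = [k for k in range(len(X)) if k % 2 == 1]
    X_l, y_l = [X[k] for k in learn], [y[k] for k in learn]
    candidates = []
    for i, j in combinations(range(len(X[0])), 2):
        coef = fit_neuron(X_l, y_l, i, j)
        # External criterion: squared error on rows not used for fitting.
        err = sum((predict(coef, X[k], i, j) - y[k]) ** 2 for k in check)
        candidates.append((err, (i, j), coef))
    candidates.sort(key=lambda c: c[0])
    return candidates[:n_survivors]


# Made-up data: the output depends on inputs 0 and 2 only.
X = [[float(k), float((k * k) % 7), float((3 * k) % 5)] for k in range(12)]
y = [2.0 + 3.0 * r[0] - r[2] for r in X]
survivors = gmdh_layer(X, y)
print(survivors[0][1])  # (0, 2)
```

The key design point is that candidates are ranked on data they were not fitted to, so the selection favors neurons that generalize rather than those that merely fit the learning subset. In a full network, the outputs of the surviving neurons would become the inputs of the next layer.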

The percentage errors obtained for a 5-day prediction on unseen data are shown in the table below.

Forecast Horizon	DJIA	DJTA	NYVD (000)	DJBA	DJUA	SP500H	SP500L	SP500C
1 day ahead	0.52	1.79	25.20	0.02	0.98	0.14	0.10	0.29
2 days ahead	1.02	2.19	13.30	0.04	1.75	0.36	0.23	0.44
3 days ahead	0.88	2.13	3.78	0.10	1.79	0.18	0.32	0.50
4 days ahead	0.33	1.71	18.89	0.23	2.10	0.49	0.39	0.09
5 days ahead	0.36	1.82	1.38	0.34	2.40	0.26	0.40	0.16

mean absolute %Err	0.62	1.93	12.51	0.15	1.80	0.28	0.29	0.30

Absolute percentage errors of 5-day out-of-sample predictions obtained by a system of equations created with KnowledgeMiner
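The error measure can be written down in a few lines. The sketch below assumes the usual definition of the absolute percentage error, |predicted - actual| / |actual| × 100, which the page itself does not spell out; the function names and the single example prediction are hypothetical, while the DJIA values are taken from the table.

```python
def abs_pct_error(predicted, actual):
    """Absolute percentage error of a single prediction."""
    return abs(predicted - actual) / abs(actual) * 100.0


def mean(values):
    return sum(values) / len(values)


# Hypothetical single prediction, for illustration only.
print(round(abs_pct_error(101.0, 100.0), 2))  # 1.0

# DJIA column of the table (1 to 5 days ahead).
djia_errors = [0.52, 1.02, 0.88, 0.33, 0.36]
print(round(mean(djia_errors), 2))  # 0.62
```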

Model graph and 5-day out-of-sample prediction of the S&P500C in comparison to the true values

The underlying models are simple, robust and accurate. For the SP500C, for instance, this analytical model was created:

SP500C = 12.8727 + 0.0026 DJIA + 0.7906 SP500L + 0.0237 SP500L(t-1) + 0.00001 NYVD(t-1)
         - 0.0060 DJTA(t-5) - 0.6527 SP500C(t-1) + 0.8036 SP500H.
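Because the model is an explicit linear equation, it can be evaluated directly. The following Python sketch copies the coefficients from the model above; the data layout (a dictionary of daily series), the function name sp500c_model and the input values are assumptions made for illustration, and the numbers are made up rather than real market data.

```python
INTERCEPT = 12.8727
# (coefficient, series name, lag in days), copied from the model above.
TERMS = [
    (0.0026, "DJIA", 0),
    (0.7906, "SP500L", 0),
    (0.0237, "SP500L", 1),
    (0.00001, "NYVD", 1),
    (-0.0060, "DJTA", 5),
    (-0.6527, "SP500C", 1),
    (0.8036, "SP500H", 0),
]
MAX_LAG = max(lag for _, _, lag in TERMS)


def sp500c_model(history, t):
    """Evaluate the model for day index t.

    history maps each series name to a list of daily values.
    """
    if t < MAX_LAG:
        raise ValueError("need at least %d earlier days" % MAX_LAG)
    return INTERCEPT + sum(
        coef * history[name][t - lag] for coef, name, lag in TERMS
    )


# Made-up series of 6 days each, for illustration only.
names = ["DJIA", "DJTA", "NYVD", "SP500L", "SP500H", "SP500C"]
history = {name: [0.0] * 6 for name in names}
history["DJIA"][5] = 1000.0
print(round(sp500c_model(history, 5), 4))  # 15.4727
```

Note that the model uses the same-day values of DJIA, SP500L and SP500H, so these must be available, or predicted by their own equations in the system, before SP500C can be computed.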

To find a comparable statistical model, a statistician would have to work hard, since this model structure was not given a priori. KnowledgeMiner evolves this relationship by itself within minutes, leaving you time to get other things done. Compared with Neural Networks, KnowledgeMiner at least has the advantage of making the extracted knowledge visible and directly usable without additional effort, and in most cases it also yields more robust and more accurate models in a shorter time. In this way you get better results while saving your resources.

The example file contains the models along with the data so that the results can easily be reviewed.

frank_lemke@magicvillage.de

julian@sierra.net